## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## [1] 4898 13
## 'data.frame': 4898 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
## X fixed.acidity volatile.acidity citric.acid
## Min. : 1 Min. : 3.800 Min. :0.0800 Min. :0.0000
## 1st Qu.:1225 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700
## Median :2450 Median : 6.800 Median :0.2600 Median :0.3200
## Mean :2450 Mean : 6.855 Mean :0.2782 Mean :0.3342
## 3rd Qu.:3674 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900
## Max. :4898 Max. :14.200 Max. :1.1000 Max. :1.6600
## residual.sugar chlorides free.sulfur.dioxide
## Min. : 0.600 Min. :0.00900 Min. : 2.00
## 1st Qu.: 1.700 1st Qu.:0.03600 1st Qu.: 23.00
## Median : 5.200 Median :0.04300 Median : 34.00
## Mean : 6.391 Mean :0.04577 Mean : 35.31
## 3rd Qu.: 9.900 3rd Qu.:0.05000 3rd Qu.: 46.00
## Max. :65.800 Max. :0.34600 Max. :289.00
## total.sulfur.dioxide density pH sulphates
## Min. : 9.0 Min. :0.9871 Min. :2.720 Min. :0.2200
## 1st Qu.:108.0 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100
## Median :134.0 Median :0.9937 Median :3.180 Median :0.4700
## Mean :138.4 Mean :0.9940 Mean :3.188 Mean :0.4898
## 3rd Qu.:167.0 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500
## Max. :440.0 Max. :1.0390 Max. :3.820 Max. :1.0800
## alcohol quality
## Min. : 8.00 Min. :3.000
## 1st Qu.: 9.50 1st Qu.:5.000
## Median :10.40 Median :6.000
## Mean :10.51 Mean :5.878
## 3rd Qu.:11.40 3rd Qu.:6.000
## Max. :14.20 Max. :9.000
Most wines have a quality score around 6.
Most wines have a fixed.acidity around 7.
The volatile.acidity distribution has a long right tail. After a log10 transformation it appears unimodal, peaking around 0.4 or so.
Most wines have a citric.acid value of about 0.25. Citric acid is found in small quantities and can add ‘freshness’ and flavor to wines.
The residual.sugar distribution also has a long right tail. After a log10 transformation it appears bimodal, peaking around 2 or so and again around 9.
The chlorides distribution has a long right tail as well. After a log10 transformation it appears unimodal, peaking around 0.07 or so.
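The effect of the log10 transformation described above can be sketched in base R. This is a minimal illustration on synthetic data, not the code used for the report: the `sugar` vector, the `skewness` helper and the choice of 30 breaks are all assumptions made for the example.

```r
# Synthetic stand-in for a long-tailed variable such as residual.sugar.
# The values are simulated (assumption); the real data are not used here.
set.seed(42)
sugar <- rlnorm(1000, meanlog = 1.5, sdlog = 0.8)

# Moment-based skewness helper (hypothetical name, not from a package).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

raw_skew <- skewness(sugar)         # strongly positive: long right tail
log_skew <- skewness(log10(sugar))  # close to zero after the transform

# Histogram on the log10 scale; 30 breaks is an arbitrary choice.
log_hist <- hist(log10(sugar), breaks = 30, plot = FALSE)

# Back-transform the midpoint of the tallest bin to the original units.
peak <- 10^log_hist$mids[which.max(log_hist$counts)]
```

Reading the peak off the back-transformed bin midpoint gives a value in the variable's original units, which is how a peak located on a log10 axis would be reported.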
Free.sulfur.dioxide is skewed to the right (mean 35.31, median 34.00, maximum 289.00). Most wines have a free.sulfur.dioxide of about 30.
Most wines have a total.sulfur.dioxide between 100 mg/dm^3 and 150 mg/dm^3: median 134.0 mg/dm^3 and mean 138.4 mg/dm^3.
Most wines have a density between 0.992 g/cm^3 and 0.995 g/cm^3: median 0.9937 g/cm^3 and mean 0.994 g/cm^3.
Most wines have a pH between 3.1 and 3.2: median 3.180 and mean 3.188.
Most wines have an alcohol value between 9 and 11.
There are 4898 wines in the dataset with 13 features (X, fixed.acidity, volatile.acidity, citric.acid, residual.sugar, chlorides, free.sulfur.dioxide, total.sulfur.dioxide, density, pH, sulphates, alcohol and quality).
Other observations:
The main feature in the data set is quality. I’d like to determine which features are best for predicting the quality of wines. I suspect alcohol and some combination of the other variables can be used to build a predictive model of wine quality.
Fixed.acidity, volatile.acidity, chlorides, total.sulfur.dioxide, density, pH and alcohol are likely to contribute to the quality of wines. After researching information on wine quality, I think alcohol and density probably contribute most.
When free SO2 concentrations are over 50 ppm, SO2 becomes evident in the nose and taste of wine. I created a variable, free.sulfur.dioxide_over50ppm, to show how far free sulfur dioxide is above that threshold: it is the free sulfur dioxide value minus 50, so positive values mark wines over 50 ppm.
I want to find out whether bound sulfur dioxide has an impact on wine quality, so I also created a variable for the bound sulfur dioxide of wines. Since total sulfur dioxide is the amount of free and bound forms of SO2, the difference between total sulfur dioxide and free sulfur dioxide is the bound sulfur dioxide.
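The two derived variables can be sketched as follows. The column names match the data set; the first four rows reuse values shown in the `str()` output, while the fifth row and the logical flag `over_50ppm` are hypothetical additions for illustration.

```r
# Small stand-in data frame; the last row is made up (assumption) so that
# at least one wine exceeds the 50 ppm threshold.
wq <- data.frame(
  free.sulfur.dioxide  = c(45, 14, 30, 47, 60),
  total.sulfur.dioxide = c(170, 132, 97, 186, 210)
)

# Free SO2 relative to the 50 ppm threshold (positive means above it).
wq$free.sulfur.dioxide_over50ppm <- wq$free.sulfur.dioxide - 50

# Bound SO2 is what remains of the total once the free form is removed.
wq$bound.sulfur.dioxide <- wq$total.sulfur.dioxide - wq$free.sulfur.dioxide

# Hypothetical logical flag, handy for grouping or colouring plots.
wq$over_50ppm <- wq$free.sulfur.dioxide > 50
```

Because subtracting a constant does not change a Pearson correlation, free.sulfur.dioxide_over50ppm correlates with every other variable exactly as free.sulfur.dioxide does; a logical flag like the one above would carry different information.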
I log-transformed the right-skewed residual.sugar distribution. The transformed distribution appears bimodal, with residual.sugar peaking around 2 gram/liter or so and again around 9 gram/liter.
##
## Two-Step Estimates
##
## Correlations/Type of Correlation:
## X fixed.acidity volatile.acidity
## X 1 Pearson Pearson
## fixed.acidity -0.2558 1 Pearson
## volatile.acidity 0.002858 -0.0227 1
## citric.acid -0.1499 0.2892 -0.1495
## residual.sugar 0.006624 0.08902 0.06429
## chlorides -0.04565 0.02309 0.07051
## free.sulfur.dioxide -0.01193 -0.0494 -0.09701
## total.sulfur.dioxide -0.162 0.09107 0.08926
## density -0.186 0.2653 0.02711
## pH -0.1158 -0.4259 -0.03192
## sulphates 0.009808 -0.01714 -0.03573
## alcohol 0.2137 -0.1209 0.06772
## quality 0.03576 -0.1137 -0.1947
## free.sulfur.dioxide_over50ppm -0.01193 -0.0494 -0.09701
## bound.sulfur.dioxide -0.1924 0.1357 0.1568
## citric.acid residual.sugar chlorides
## X Pearson Pearson Pearson
## fixed.acidity Pearson Pearson Pearson
## volatile.acidity Pearson Pearson Pearson
## citric.acid 1 Pearson Pearson
## residual.sugar 0.09421 1 Pearson
## chlorides 0.1144 0.08868 1
## free.sulfur.dioxide 0.09408 0.2991 0.1014
## total.sulfur.dioxide 0.1211 0.4014 0.1989
## density 0.1495 0.839 0.2572
## pH -0.1637 -0.1941 -0.09044
## sulphates 0.06233 -0.02666 0.01676
## alcohol -0.07573 -0.4506 -0.3602
## quality -0.009209 -0.09758 -0.2099
## free.sulfur.dioxide_over50ppm 0.09408 0.2991 0.1014
## bound.sulfur.dioxide 0.1022 0.3448 0.1938
## free.sulfur.dioxide total.sulfur.dioxide
## X Pearson Pearson
## fixed.acidity Pearson Pearson
## volatile.acidity Pearson Pearson
## citric.acid Pearson Pearson
## residual.sugar Pearson Pearson
## chlorides Pearson Pearson
## free.sulfur.dioxide 1 Pearson
## total.sulfur.dioxide 0.6155 1
## density 0.2942 0.5299
## pH -0.0006178 0.002321
## sulphates 0.05922 0.1346
## alcohol -0.2501 -0.4489
## quality 0.008158 -0.1747
## free.sulfur.dioxide_over50ppm 1 0.6155
## bound.sulfur.dioxide 0.2635 0.9225
## density pH sulphates alcohol
## X Pearson Pearson Pearson Pearson
## fixed.acidity Pearson Pearson Pearson Pearson
## volatile.acidity Pearson Pearson Pearson Pearson
## citric.acid Pearson Pearson Pearson Pearson
## residual.sugar Pearson Pearson Pearson Pearson
## chlorides Pearson Pearson Pearson Pearson
## free.sulfur.dioxide Pearson Pearson Pearson Pearson
## total.sulfur.dioxide Pearson Pearson Pearson Pearson
## density 1 Pearson Pearson Pearson
## pH -0.09359 1 Pearson Pearson
## sulphates 0.07449 0.156 1 Pearson
## alcohol -0.7801 0.1214 -0.01743 1
## quality -0.3071 0.09943 0.05368 0.4356
## free.sulfur.dioxide_over50ppm 0.2942 -0.0006178 0.05922 -0.2501
## bound.sulfur.dioxide 0.5044 0.003143 0.1357 -0.4269
## quality free.sulfur.dioxide_over50ppm
## X Pearson Pearson
## fixed.acidity Pearson Pearson
## volatile.acidity Pearson Pearson
## citric.acid Pearson Pearson
## residual.sugar Pearson Pearson
## chlorides Pearson Pearson
## free.sulfur.dioxide Pearson Pearson
## total.sulfur.dioxide Pearson Pearson
## density Pearson Pearson
## pH Pearson Pearson
## sulphates Pearson Pearson
## alcohol Pearson Pearson
## quality 1 Pearson
## free.sulfur.dioxide_over50ppm 0.008158 1
## bound.sulfur.dioxide -0.2179 0.2635
## bound.sulfur.dioxide
## X Pearson
## fixed.acidity Pearson
## volatile.acidity Pearson
## citric.acid Pearson
## residual.sugar Pearson
## chlorides Pearson
## free.sulfur.dioxide Pearson
## total.sulfur.dioxide Pearson
## density Pearson
## pH Pearson
## sulphates Pearson
## alcohol Pearson
## quality Pearson
## free.sulfur.dioxide_over50ppm Pearson
## bound.sulfur.dioxide 1
##
## Standard Errors:
## X fixed.acidity volatile.acidity
## X
## fixed.acidity 0.01336
## volatile.acidity 0.01429 0.01428
## citric.acid 0.01397 0.0131 0.01397
## residual.sugar 0.01429 0.01418 0.01423
## chlorides 0.01426 0.01428 0.01422
## free.sulfur.dioxide 0.01429 0.01426 0.01416
## total.sulfur.dioxide 0.01392 0.01417 0.01418
## density 0.0138 0.01328 0.01428
## pH 0.0141 0.0117 0.01428
## sulphates 0.01429 0.01429 0.01427
## alcohol 0.01364 0.01408 0.01422
## quality 0.01427 0.01411 0.01375
## free.sulfur.dioxide_over50ppm 0.01429 0.01426 0.01416
## bound.sulfur.dioxide 0.01376 0.01403 0.01394
## citric.acid residual.sugar chlorides
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar 0.01416
## chlorides 0.0141 0.01418
## free.sulfur.dioxide 0.01416 0.01301 0.01414
## total.sulfur.dioxide 0.01408 0.01199 0.01372
## density 0.01397 0.004233 0.01334
## pH 0.01391 0.01375 0.01417
## sulphates 0.01423 0.01428 0.01429
## alcohol 0.01421 0.01139 0.01244
## quality 0.01429 0.01415 0.01366
## free.sulfur.dioxide_over50ppm 0.01416 0.01301 0.01414
## bound.sulfur.dioxide 0.01414 0.01259 0.01375
## free.sulfur.dioxide total.sulfur.dioxide
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar
## chlorides
## free.sulfur.dioxide
## total.sulfur.dioxide 0.008878
## density 0.01305 0.01028
## pH 0.01429 0.01429
## sulphates 0.01424 0.01403
## alcohol 0.0134 0.01141
## quality 0.01429 0.01385
## free.sulfur.dioxide_over50ppm 0 0.008878
## bound.sulfur.dioxide 0.0133 0.00213
## density pH sulphates alcohol quality
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar
## chlorides
## free.sulfur.dioxide
## total.sulfur.dioxide
## density
## pH 0.01416
## sulphates 0.01421 0.01394
## alcohol 0.005594 0.01408 0.01429
## quality 0.01294 0.01415 0.01425 0.01158
## free.sulfur.dioxide_over50ppm 0.01305 0.01429 0.01424 0.0134 0.01429
## bound.sulfur.dioxide 0.01065 0.01429 0.01403 0.01169 0.01361
## free.sulfur.dioxide_over50ppm
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar
## chlorides
## free.sulfur.dioxide
## total.sulfur.dioxide
## density
## pH
## sulphates
## alcohol
## quality
## free.sulfur.dioxide_over50ppm
## bound.sulfur.dioxide 0.0133
##
## n = 4898
##
## P-values for Tests of Bivariate Normality:
## X fixed.acidity volatile.acidity
## X
## fixed.acidity 1.384e-135
## volatile.acidity 4.43e-79 8.326e-51
## citric.acid 8.099e-177 7.094e-126 3.11e-162
## residual.sugar 1.269e-153 3.961e-142 3.871e-146
## chlorides 0 0 0
## free.sulfur.dioxide 2.436e-59 9.489e-44 2.307e-50
## total.sulfur.dioxide 4.165e-65 1.731e-38 3.649e-49
## density 6.906e-101 2.053e-49 1.458e-45
## pH 2.823e-57 5.114e-36 2.379e-36
## sulphates 1.308e-56 1.076e-33 4.068e-33
## alcohol 3.053e-105 1.172e-74 1.458e-96
## quality 0 0 0
## free.sulfur.dioxide_over50ppm 2.436e-59 9.489e-44 2.307e-50
## bound.sulfur.dioxide 1.722e-70 2.943e-38 9.033e-43
## citric.acid residual.sugar chlorides
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar 6.704e-208
## chlorides 0 0
## free.sulfur.dioxide 1.481e-110 2.279e-119 0
## total.sulfur.dioxide 2.145e-108 9.659e-122 0
## density 1.894e-132 3.89e-196 0
## pH 3.439e-101 1.085e-119 0
## sulphates 4.195e-103 2.257e-116 0
## alcohol 6.265e-186 3.624e-202 0
## quality 0 0 0
## free.sulfur.dioxide_over50ppm 1.481e-110 2.279e-119 0
## bound.sulfur.dioxide 7.878e-118 1.409e-123 0
## free.sulfur.dioxide total.sulfur.dioxide
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar
## chlorides
## free.sulfur.dioxide
## total.sulfur.dioxide 2.231e-30
## density 1.384e-52 1.193e-28
## pH 3.012e-24 3.591e-17
## sulphates 1.06e-18 6.053e-32
## alcohol 9.643e-71 2.343e-57
## quality 0 0
## free.sulfur.dioxide_over50ppm NaN 2.231e-30
## bound.sulfur.dioxide 3.844e-25 3.543e-32
## density pH sulphates alcohol
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar
## chlorides
## free.sulfur.dioxide
## total.sulfur.dioxide
## density
## pH 1.448e-34
## sulphates 1.796e-35 1.473e-17
## alcohol 3.223e-108 2.598e-62 3.961e-84
## quality 0 0 0 0
## free.sulfur.dioxide_over50ppm 1.384e-52 3.012e-24 1.06e-18 9.643e-71
## bound.sulfur.dioxide 5.486e-24 1.779e-15 1.816e-39 2.181e-55
## quality free.sulfur.dioxide_over50ppm
## X
## fixed.acidity
## volatile.acidity
## citric.acid
## residual.sugar
## chlorides
## free.sulfur.dioxide
## total.sulfur.dioxide
## density
## pH
## sulphates
## alcohol
## quality
## free.sulfur.dioxide_over50ppm 0
## bound.sulfur.dioxide 0 3.844e-25
From the above table, residual.sugar, total.sulfur.dioxide and chlorides do not seem to have strong correlations with quality, but they are moderately correlated with alcohol and density, which have relatively strong correlations with quality. I want to look closer at scatter plots involving quality and some other variables like alcohol, density, residual.sugar, total.sulfur.dioxide and chlorides.
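A table like the one above can be scanned more quickly by pulling out a single column and sorting it. The layout of the output resembles what `hetcor()` from the polycor package prints; the sketch below swaps in base R's `cor()` instead, and runs on simulated data whose coefficients are assumptions chosen only so that the signs match the table.

```r
set.seed(1)
n <- 500

# Simulated stand-ins (assumption): not the real wine measurements.
alcohol <- rnorm(n, mean = 10.5, sd = 1.2)
density <- 1.02 - 0.0025 * alcohol + rnorm(n, sd = 0.0015)
quality <- round(2 + 0.35 * alcohol + rnorm(n, sd = 0.8))
wq_demo <- data.frame(alcohol, density, quality)

# Pearson correlation matrix for all pairs of columns.
cor_mat <- cor(wq_demo, method = "pearson")

# Correlations of every other variable with quality, strongest first.
q_cor <- cor_mat[rownames(cor_mat) != "quality", "quality"]
q_cor_sorted <- q_cor[order(abs(q_cor), decreasing = TRUE)]
print(round(q_cor_sorted, 3))
```

Sorting by absolute value keeps strong negative correlations, such as the one between density and quality, near the top instead of at the bottom of the list.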
Comparing alcohol to quality, the plot suffers from some overplotting. Most wines have an alcohol value between 9 and 13.
## wq$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.55 10.45 10.34 11.00 12.60
## --------------------------------------------------------
## wq$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.40 10.10 10.15 10.75 13.50
## --------------------------------------------------------
## wq$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.000 9.200 9.500 9.809 10.300 13.600
## --------------------------------------------------------
## wq$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 9.60 10.50 10.58 11.40 14.00
## --------------------------------------------------------
## wq$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.60 10.60 11.40 11.37 12.30 14.20
## --------------------------------------------------------
## wq$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 11.00 12.00 11.64 12.60 14.00
## --------------------------------------------------------
## wq$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.40 12.40 12.50 12.18 12.70 12.90
The highest quality score (9) has little alcohol variance.
For quality scores of 5 and above, alcohol rises with quality: the median climbs from 9.5 at score 5 to 12.5 at score 9. Below 5 the relationship is the opposite: the median falls from 10.45 at score 3 to 10.10 at score 4 and 9.5 at score 5.
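Per-score summaries of this shape are what base R's `by()` produces, and `tapply()` gives a compact way to compare just the medians. The sketch below uses a tiny hypothetical data frame (three wines per score, with values borrowed from the quartiles printed above), so it illustrates the calls, not the real results.

```r
# Hypothetical mini data set (assumption): three wines per quality score.
wq_demo <- data.frame(
  quality = c(5, 5, 5, 6, 6, 6, 7, 7, 7),
  alcohol = c(9.2, 9.5, 10.3, 9.6, 10.5, 11.4, 10.6, 11.4, 12.3)
)

# Six-number summary of alcohol within each quality score.
alcohol_by_quality <- by(wq_demo$alcohol, wq_demo$quality, summary)
print(alcohol_by_quality)

# Median alcohol per score, as a named vector.
med_by_quality <- tapply(wq_demo$alcohol, wq_demo$quality, median)

# TRUE when the median increases with every step up in quality.
rising <- all(diff(med_by_quality) > 0)
```

Checking `diff()` of the medians is a quick way to confirm a monotone trend over a range of scores before describing it in prose.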
Comparing density to quality, the plot suffers from some overplotting. Most wines have a density between 0.99 and 1.00.
## wq$quality: 3
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9911 0.9925 0.9944 0.9949 0.9969 1.0000
## --------------------------------------------------------
## wq$quality: 4
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9892 0.9926 0.9941 0.9943 0.9958 1.0000
## --------------------------------------------------------
## wq$quality: 5
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9872 0.9933 0.9953 0.9953 0.9972 1.0020
## --------------------------------------------------------
## wq$quality: 6
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9876 0.9917 0.9937 0.9940 0.9959 1.0390
## --------------------------------------------------------
## wq$quality: 7
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9906 0.9918 0.9925 0.9937 1.0000
## --------------------------------------------------------
## wq$quality: 8
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9903 0.9916 0.9922 0.9935 1.0010
## --------------------------------------------------------
## wq$quality: 9
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9896 0.9898 0.9903 0.9915 0.9906 0.9970
The highest quality score (9) has little density variance.
For quality scores of 5 and above, density generally falls as quality rises: the median drops from 0.9953 at score 5 to 0.9903 at score 9.
As density increases, residual.sugar also increases. The relationship between density and residual.sugar appears to be linear.
As density increases, chlorides also increase. The relationship between density and chlorides appears to be linear.
As alcohol increases, total.sulfur.dioxide decreases, and so does its bound component, bound.sulfur.dioxide. The relationship between alcohol and bound.sulfur.dioxide appears to be linear.
As alcohol increases, chlorides decrease. The relationship between alcohol and chlorides appears to be linear.
As alcohol increases, density decreases. The relationship between alcohol and density appears to be linear.
Quality correlates with alcohol and density. Density correlates strongly with alcohol.
In general, a high quality score is often accompanied by high alcohol and low density. As alcohol increases, density decreases. The relationship between alcohol and density appears to be linear.
The highest quality score (9) has little alcohol variance and density variance.
As density increases, residual.sugar also increases. The relationship between density and residual.sugar appears to be linear. The relationship between density and chlorides is similar.
As alcohol increases, bound.sulfur.dioxide decreases. The relationship between alcohol and bound.sulfur.dioxide appears to be linear. The relationship between alcohol and chlorides is also similar.
Density is positively and strongly correlated with residual.sugar (0.839). Alcohol correlates negatively with bound.sulfur.dioxide (-0.4269) and chlorides (-0.3602), but these relationships are weaker than the one between density and residual.sugar. As a result, residual.sugar, bound.sulfur.dioxide and chlorides could be used in a model to predict the quality of wines.
A quality score of 5 or 6 often appears at an alcohol value between 9 and 10. A quality score of 7 or 8 often appears at an alcohol value between 12 and 13. Wines with very low quality or very high quality are very rare.
A quality score of 5 or 6 often appears at a density value between about 0.992 and 0.997. A quality score of 7 or 8 often appears at a density value between about 0.990 and 0.994. Wines with very low quality or very high quality are very rare.
High quality wines tend to have high alcohol and low density. Wines with low total.sulfur.dioxide and low chlorides values are likely to get high quality scores.
Holding density constant, wines with lower residual.sugar almost always get a lower quality score than wines with higher residual.sugar (the worst quality is 3 and the best is 9).
Wines with high sugar and low salt (chlorides) tend to get high quality scores. This resonates with me because I think the flavor of wines does influence wine quality.
Wines with low bound.sulfur.dioxide and high alcohol tend to get high quality scores. Although SO2 is mostly undetectable in wine, it still influences people when they judge wine quality.
The distribution of residual.sugar appears to be bimodal on log scale, perhaps due to the demand of wines and buyers purchasing in two different ranges of sweetness. Some prefer high sweetness, while others prefer low sweetness.
There is a negative relationship between density and alcohol: as alcohol increases, density decreases, and the relationship appears to be linear. That makes sense, since the density of wine is close to that of water and depends on the percent alcohol and sugar content. Alcohol is less dense than water, so more alcohol lowers density, while dissolved sugar raises it. This can also explain why high quality scores are often accompanied by low density and high alcohol.
The plot indicates that a linear model could be constructed with quality as the outcome variable and residual.sugar as the predictor variable. Holding density constant, wines with lower residual.sugar almost always get a lower quality score than wines with higher residual.sugar (the worst quality is 3 and the best is 9).
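A linear model of this kind can be sketched with `lm()`. Including density as a second predictor is one way to express "holding density constant". The data below are simulated under an assumed rule in which quality rises with residual.sugar once density is fixed, and quality is treated as a continuous number, whereas the real scores are integers from 3 to 9, so the fitted numbers say nothing about the real wines.

```r
set.seed(7)
n <- 400

# Simulated predictors (assumption): sugar is tied to density, as in the data.
density <- rnorm(n, mean = 0.994, sd = 0.003)
residual.sugar <- pmax(0.6, 6 + 1500 * (density - 0.994) + rnorm(n, sd = 2))

# Assumed data-generating rule, chosen for illustration only.
quality <- 6 + 0.15 * residual.sugar - 300 * (density - 0.994) +
  rnorm(n, sd = 0.5)
wq_demo <- data.frame(quality, residual.sugar, density)

# Quality as outcome; the residual.sugar coefficient is its effect
# at a fixed density.
fit <- lm(quality ~ residual.sugar + density, data = wq_demo)
print(coef(fit))

# Share of variance in quality explained by the two predictors.
r_squared <- summary(fit)$r.squared
```

With strongly correlated predictors such as density and residual.sugar, the individual coefficients can be unstable, so on real data it would be worth comparing this fit against models that use only one of the two.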
The wines data set contains information on 4898 white wines with 13 variables. I started by understanding the individual variables in the data set, and then I explored interesting questions and leads as I continued to make observations on plots. Eventually, I explored the quality of wines across many variables and found a linear relationship between sweetness and quality.
There was a clear trend between the alcohol and density of a wine and its quality. The relationship between alcohol and density also supports this trend. I was surprised that even though SO2 (bound.sulfur.dioxide) is mostly undetectable in wine, it still somehow influences people’s judgement of wine quality.
I struggled to understand the right skew in the residual.sugar histogram. I used log10 to transform the residual.sugar values, and then found that residual.sugar appears bimodal, peaking around 2 or so and again at 9 or so. I then realized that this makes sense if it reflects people’s preferences about wine flavor.
There are some limitations in the source of this data. The majority of wines are scored 5 to 7, and there is a lack of very high quality and very low quality wines in the dataset. If I used this dataset to build a model to predict wine quality, the result might not be accurate enough. Given that the wines date to 2009, more features or other ingredients that influence wine quality may have been discovered since, so the factors provided in this dataset may be insufficient. To investigate this data further, I would examine the relationships among other features and build a predictive model. I would be interested in testing a linear model to predict current wine quality and in determining how accurately it predicts the quality score. A more recent dataset would be better for predicting wine quality, and comparisons might be made with other linear models to see whether other variables account for wine quality.